The protein sequence database [illegible] evolution by changes in
frame of translation, [illegible] frames of translation. The
substitution matrices for [illegible] were computed in
[illegible] statistical significance [illegible] database that preserve any
the true database and shuffled versions of [illegible] databases
potential codon bias. The comparison [illegible] relationships. We
provides a very sensitive method for detecting [illegible]
find a weak but measurable relatedness [illegible] evolved from others
Marcotte E M; Pellegrini M; Thompson M J;
Yeates T O; Eisenberg D
Molecular Medicine, University of California, Los
A census of protein repeats.
Marcotte E M; Pellegrini M; Yeates T O;
Eisenberg D
Molecular Biology Institute, UCLA-DOE Lab
Molecular Medicine, Los Angeles, CA, [illegible]
JOURNAL OF MOLECULAR BIOLOGY, (1999 Oct 15
Journal code: J6V. ISSN: 0022-2836.
ENGLAND: United Kingdom
Journal; Article; (JOURNAL ARTICLE)
English
Priority Journals
200001
In this study, we analyzed all known protein sequences for repeating amino
acid segments. Although duplicated sequence segments [illegible]
proteins, eukaryotic proteins are three times more likely to have internal
repeats than prokaryotic proteins. After clustering the [illegible]
segments into families, we find repeats from eukaryotic proteins
have little similarity with prokaryotic repeats, suggesting most repeats
arose after the prokaryotic and eukaryotic lineages diverged.
Consequently, protein classes with the highest incidence of repetitive
sequences perform functions unique to eukaryotes. The frequency
distribution of the repeating units shows only weak length dependence,
implicating [illegible]
formation as the limiting mechanism underlying repeat formation. The
mechanism favors additional repeats once an initial duplication has been
incorporated. Finally, we show that repetitive sequences are favored that
contain small and relatively water-soluble residues. We propose that
error-prone repeat expansion allows repetitive proteins to evolve more
quickly than non-repeat-containing proteins. Copyright 1999 Academic
Press.
protein function and protein-protein interactions from genome
sequences.
AU Marcotte E M; Pellegrini M; Ng H L; Rice D W;
Yeates T O; Eisenberg D
CS UCLA-Department of Energy Laboratory of Structural Biology and Molecular
Medicine, University of California at Los Angeles, Los Angeles, CA
90095-1570, USA.
NC P01 GM 31299 (NIGMS)
SO SCIENCE, (1999 Jul 30) 285 (5428) 751-3.
Journal code: UJ7. ISSN: 0036-8075.
CY United States
DT Journal; Article; (JOURNAL ARTICLE)
LA English
FS Priority Journals; Cancer Journals
EM 199910
EW 19991003
AB A computational method is proposed for inferring protein interactions from
genome sequences on the basis of the observation that some pairs of
interacting proteins have homologs in another organism fused into a single
protein chain. Searching sequences from many genomes revealed 6809 such
putative protein-protein interactions in Escherichia coli and 45,502 in
yeast. Many members of these pairs were confirmed as functionally related;
computational filtering further enriches for interactions. Some proteins
have links to several other proteins; these coupled links appear to
represent functional interactions such as complexes or pathways.
Experimentally confirmed interacting pairs are documented in a Database of
Interacting Proteins.
CT Check Tags: Human; Support, Non-U.S. Gov't; Support, U.S. Gov't,
Non-P.H.S.; Support, U.S. Gov't, P.H.S.
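The fusion-based inference described in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: the input format (`hits`), the function name `fusion_links`, and the requirement that the two matched regions do not overlap on the fused chain are all assumptions made for illustration.

```python
from itertools import combinations


def fusion_links(hits):
    """Infer putative functional links from fused homologs.

    `hits` (hypothetical format) maps a candidate fused protein from another
    genome to a list of (query_protein, start, end) regions, one for each
    query protein that has a homolog inside that fused chain.
    """
    links = set()
    for fused, regions in hits.items():
        for (a, a_start, a_end), (b, b_start, b_end) in combinations(regions, 2):
            if a == b:
                continue
            # Assumption: the two homologous regions must occupy distinct,
            # non-overlapping stretches of the fused chain.
            if a_end < b_start or b_end < a_start:
                links.add(frozenset((a, b)))
    return links


# Toy data: fusedX supports a link, fusedY is rejected because the regions overlap.
hits = {
    "fusedX": [("protA", 1, 120), ("protB", 130, 300)],
    "fusedY": [("protC", 1, 200), ("protD", 150, 320)],
}
links = fusion_links(hits)
```

With the toy data, only the pair protA/protB is linked. Statistical filtering of the candidate links, which the abstract mentions, is not shown.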
TI Transproteomic evidence of a loop-deletion mechanism for enhancing protein
thermostability [published erratum appears in J Mol Biol 1999 Oct
1;292 (4):946].
AU Thompson M J; Eisenberg D
CS University of California Los Angeles, Los Angeles, CA 90095-1570, USA.
SO JOURNAL OF MOLECULAR BIOLOGY, (1999 Jul 9) 290 (2) 595-604.
Journal code: J6V. ISSN: 0022-2836.
CY ENGLAND: United Kingdom
DT Journal; Article; (JOURNAL ARTICLE)
LA English
FS Cancer Journals; Priority Journals
EM 199910
AB Understanding the molecular determinants of protein thermostability is of
theoretical and practical importance. While numerous determinants have
been suggested, no molecular feature has been judged of paramount
importance, with the possible exception of ion-pair networks. The
difficulty in identifying the main determinants may have been the limited
structural information available on the thermostable proteins. Recently
the complete genomes for mesophilic, thermophilic and hyperthermophilic
organisms have been sequenced, vastly improving the potential for
uncovering general trends in sequence and structure evolution related to
thermostability and, thus, for isolating the more important determinants.
From a comparative analysis of 20 complete genomes, we find a trend
towards shortened thermophilic proteins relative to their mesophilic
homologs. Moreover, sequence alignments to proteins of known
structure indicate that thermophilic sequences are more likely than their
mesophilic homologs to have deletions in exposed loop regions. The new
genomes offer enough comparable sequences to compute meaningful statistics
that point to loop deletion as a general evolutionary strategy for
increasing thermostability. Copyright 1999 Academic Press.
CT Check Tags: Comparative Study; Support, U.S. Gov't, Non-P.H.S.; Support,
U.S. Gov't, P.H.S.
TI A fast algorithm for genome-wide analysis of proteins with repeated
sequences.
AU Pellegrini M; Marcotte E M; Yeates T O
CS Molecular Biology Institute and UCLA-DOE Laboratory of Structural Biology
and Molecular Medicine, University of California, Los Angeles, 90095-1570,
DT Journal; Article; (JOURNAL ARTICLE)
LA English
FS Priority Journals
EM 199910
EW 19991002
AB We present a fast algorithm to search for repeating fragments within
protein sequences. The technique is based on an extension of the
Smith-Waterman algorithm that allows the calculation of sub-optimal
alignments of a sequence against itself. We are able to estimate
the statistical significance of all sub-optimal alignment
scores. We also rapidly determine the length of the repeating fragment and
the number of times it is found in a sequence. The technique is applied to
sequences in the Swissprot database, and to 16 complete genomes. We find
that eukaryotic proteins contain more internal repeats than those of
prokaryotic and archaeal organisms. The finding that [illegible] of yeast sequences
and 18% of the known human sequences contain detectable repeats emphasizes
the importance of internal duplication in protein evolution.
CT Check Tags: Human; Support, Non-U.S. Gov't; Support, U.S. Gov't,
Non-P.H.S.; Support, U.S. Gov't, P.H.S.
*Algorithms
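The core idea of the record above, a Smith-Waterman local alignment of a sequence against itself, can be sketched as follows. This is only a minimal illustration: it finds the single best self-alignment, whereas the abstract describes an extension that also yields sub-optimal alignments and their statistical significance. The identity scoring scheme (`match`, `mismatch`, `gap`) and the function name are assumptions; a real application would use an amino acid substitution matrix.

```python
def self_local_alignment(seq, match=2, mismatch=-1, gap=-2):
    """Best local alignment of `seq` against itself, excluding the trivial
    self-match, using a Smith-Waterman recurrence with a linear gap penalty.

    Returns (best_score, (i, j)) where i < j are the 1-based end positions
    of the two aligned copies.
    """
    n = len(seq)
    # H[i][j]: best score of a local alignment ending with seq[i-1] vs seq[j-1]
    H = [[0] * (n + 1) for _ in range(n + 1)]
    best, best_end = 0, (0, 0)
    for i in range(1, n + 1):
        # Only the upper triangle (j > i) is filled, so a residue is never
        # aligned with itself and the main diagonal stays at zero.
        for j in range(i + 1, n + 1):
            s = match if seq[i - 1] == seq[j - 1] else mismatch
            H[i][j] = max(
                0,
                H[i - 1][j - 1] + s,  # extend the alignment diagonally
                H[i - 1][j] + gap,    # gap in the second copy
                H[i][j - 1] + gap,    # gap in the first copy
            )
            if H[i][j] > best:
                best, best_end = H[i][j], (i, j)
    return best, best_end


score, end = self_local_alignment("ABCDXXABCD")
```

For the toy sequence, the repeated fragment ABCD gives four matches, so the best score is 8 with the two copies ending at positions 4 and 10. The repeat length would be recovered by a traceback from that cell, which is omitted here.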
TI Predicting structures for genome proteins.
AU Fischer D; Eisenberg D
Faculty of Natural Science, Department of Math and Computer Science,
Beer-Sheva, 84015, Israel. dfischer@cs.bgu.ac.il
SO CURRENT OPINION IN STRUCTURAL BIOLOGY, (1999 Apr) 9 (2) 208-11. Ref: 22
Journal code: B9V. ISSN: 0959-440X.
CY ENGLAND: United Kingdom
DT Journal; Article; (JOURNAL ARTICLE)
General Review; (REVIEW)
(REVIEW, TUTORIAL)
LA English
FS Priority Journals
EM 199910
AB Assigning three-dimensional protein folds to genome sequences is essential
to understanding protein function. Although experimental three-dimensional
structures are currently available for only a very small fraction of these
sequences, computational fold assignment is able to assign folds to [illegible]
of the sequences in various genomes. This percentage varies depending on
the particular organism under analysis, on the sensitivities of the
methods used and on the number of experimental structures available at the
time the assignment is carried out. The fraction of assignable sequences
is currently increasing at an annual rate of roughly 18%. If this rate is
sustained throughout the coming years, three-dimensional computational
models for more than half of the genome sequences may be available by the
year 2003.
CT Check Tags: Human
Crystallization
TI Assigning protein functions by comparative genome analysis: protein
phylogenetic profiles.
AU Pellegrini M; Marcotte E M; Thompson M J;
Eisenberg D; Yeates T O
CS Molecular Biology Institute and Departments of Energy Laboratory of
Structural Biology and Molecular Medicine, University of California, Los
SO PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF
AMERICA, (1999 Apr 13) 96 (8) 4285-8.
Journal code: PV3. ISSN: 0027-8424.
DT Journal; Article; (JOURNAL ARTICLE)
LA English
FS Priority Journals; Cancer Journals
AB Determining protein functions from genomic sequences is a central goal of
bioinformatics. We present a method based on the assumption that proteins
that function together in a pathway or structural complex are likely to
evolve in a correlated fashion. During evolution, all such functionally
linked proteins tend to be either preserved or eliminated in a new
species. We describe this property of correlated evolution by
characterizing each protein by its phylogenetic profile, a string that
encodes the presence or absence of a protein in every known genome. We
show that proteins having matching or similar profiles strongly tend to be
functionally linked. This method of phylogenetic profiling allows us to
predict the function of uncharacterized proteins.
CT Check Tags: Comparative Study; Support, Non-U.S. Gov't; Support, U.S.
Gov't, Non-P.H.S.; Support, U.S. Gov't, P.H.S.
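A phylogenetic profile as described in the abstract above, a string encoding presence or absence of a protein in each genome, can be sketched in a few lines. The representation of genomes as sets of protein family names, the toy data, and the use of Hamming distance to measure profile similarity are assumptions for illustration, not details taken from the record.

```python
def phylogenetic_profile(protein, genomes):
    """Return a bit string with one character per genome:
    '1' if the protein (family) is present in that genome, '0' otherwise."""
    return "".join("1" if protein in genome else "0" for genome in genomes)


def hamming(p, q):
    """Number of genomes in which two equal-length profiles differ."""
    return sum(a != b for a, b in zip(p, q))


# Toy data: each genome is a set of (hypothetical) protein family names.
genomes = [{"P1", "P2"}, {"P3"}, {"P1", "P2", "P3"}, {"P1", "P2"}]

profiles = {p: phylogenetic_profile(p, genomes) for p in ("P1", "P2", "P3")}
distance_12 = hamming(profiles["P1"], profiles["P2"])
distance_13 = hamming(profiles["P1"], profiles["P3"])
```

Here P1 and P2 share the profile 1011 (distance 0) and would be predicted to be functionally linked, while P3 has profile 0110 and differs from P1 in three genomes.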
Fold assignments for amino acid sequences of the CASP2 experiment.
[illegible]; Weiss R; Eisenberg D
UCLA-DOE Laboratory of Structural Biology and Molecular Medicine
PROTEINS, (1997) Suppl 1 113-22.
Journal code: PTS. ISSN: 0887-3585.
CY United States
DT Journal; Article; (JOURNAL ARTICLE)
LA English
FS Priority Journals
AB New and newly extended methods for fold assignment were tested for their
abilities to assign folds to amino acid target sequences of unknown
three-dimensional structure. These target sequences, released through the
CASP2 experiment, are not obviously related to any sequence of known
three-dimensional (3D) structure. We assigned 3D folds to target sequences
and filed these predictions with CASP2 before their 3D structures were
released. The methods tested were (1) Environmental 3D profiles of Bowie
and colleagues [Bowie, J.U., Luthy, R., Eisenberg, D. Science 253:164-170,
1991]; (2) a variation of this [illegible];
(3) H3P2, the [illegible]-dimensional sequence-structure substitution matrix of Rice and
Eisenberg [Rice, D., Eisenberg, D. J. Mol. Biol. 267:1026-1038, 1997]; and
(4) Sequence Derived Property methods of Fischer and Eisenberg
[Fischer, D., Eisenberg, D. Prot. Sci. 5:947-955, 1996]. When the 3D
structures of the sequences were released, [illegible] our predictions were
evaluated. Of these 17, we assigned high probabilities to 11, of which 9
were correct. Five of these correct predictions were of known
structures similar to the targets and four of these were of new folds. The
evaluation demonstrated that our methods were effective in assigning the
proper fold to more than half of the CASP2 targets with known folds (5/9)
and also were able to detect half of the sequences that corresponded to no
known folds (4/8). Even when the correct fold is assigned to a sequence,
proper alignment of the sequence to the structure remains a
challenge. Our methods were able to produce accurate alignments
(< [illegible] mean residue shift error from the structural alignment)
for four of the targets, including the particularly difficult
alignment (only [illegible] residue identity in the structurally
aligned regions) of the ferrochelatase sequence to the fold of a
periplasmic binding protein.
CT Check Tags: Support, U.S. Gov't, Non-P.H.S.
Amino Acid Sequence
Ferrochelatase: CH, chemistry
Molecular Sequence Data
*Protein Folding
AB Recently there has been an explosion of methods for fold recognition.
These methods seek to align a protein sequence to a
three-dimensional structure and measure the compatibility of the sequence
to the structure. In this work, we present a benchmark to assess the
performance of such methods. The benchmark consists of a set of protein
sequences matched by superposition to known structures. This set covers a
[illegible]
insignificant sequence similarity. To demonstrate the usefulness of this
benchmark, we apply it here to compare different fold-recognition methods
developed through the years in our group as well as several
sequence-sequence substitution matrices. The results show that
"global-local" alignments are superior to either local or global
alignments. The most effective sequence-sequence matching matrix
is the Gonnet table. The best performance overall is obtained by a method
which combines the 3D-1D profiles of Bowie et al. with a substitution
matrix and takes into account residue pairwise interactions.
CT Check Tags: Comparative Study; Support, Non-U.S. Gov't; Support, U.S.
Gov't, Non-P.H.S.
*Amino Acid Sequence
TI VERIFY3D: assessment of protein models with three-dimensional profiles.
AU Eisenberg D; Luthy R; Bowie J U 

CS Laboratory of Structural Biology and Molecular Medicine, University of 
CS University of California, Los Angeles-Department of Energy Laboratory of
Structural Biology and Molecular Medicine, Molecular Biology Institute,
University of California, Los Angeles, Box 951570, Los Angeles, CA
AB A crucial step in exploiting the information inherent in genome sequences
is to assign to each protein sequence its three-dimensional fold and
[illegible] function. Here we describe fold assignment for the proteins
encoded by the small genome of Mycoplasma genitalium. The assignment was
carried out by our computer server
(http://www.doe-mbi.ucla.edu/people/frsvr/frsvr.html), which assigns folds to amino acid
sequences by comparing sequence-derived predictions with known structures.
Of a total of [illegible] protein ORFs, [illegible] can be assigned a known
[illegible] fold with high confidence, as cross-validated with tests on known
structures. Of these sequences, 75 ([illegible]) show enough sequence similarity
to proteins of known structure that they can also be detected by
traditional sequence-sequence comparison methods. That is, the difference
of 23 sequences ([illegible]) are assignable by the sequence-structure method of
the server but not by current sequence-sequence methods. Of the remaining
[illegible] of sequences in the genome, 18% belong to membrane proteins and the
remaining 60% cannot be assigned either because these sequences correspond
to no presently known fold or because of insensitivity of the method. At
the current rate of determination of new folds by x-ray and NMR methods,
extrapolation suggests that folds will be assigned to most soluble
proteins in the next decade.
CT Algorithms
Amino Acid Sequence
*Bacterial Proteins: CH, chemistry
[illegible] predicting secondary structure. Using only single sequence
data, this method achieves a three-state accuracy of [illegible] over a database
of [illegible] proteins. This approach is more [illegible] to
[illegible] and less likely to overlearn specifics of a dataset than
[illegible] methods such as neural networks. It is also conceptually simpler and
less computationally costly. We also introduce a novel method for
representing and incorporating multiple-sequence alignment
information within the prediction algorithm, achieving [illegible] accuracy over a
[illegible]. This is accomplished by creating a
[illegible] model of the evolutionarily derived correlations between
patterns of amino acid substitution and local protein structure. This
[illegible] substitution schemata, which
[illegible] the structure-based heterogeneity in the
[illegible] amino acid substitutions found in alignments of
homologous proteins. The model is [illegible]
[illegible] the mutual information between the set of schemata and the
[illegible] secondary structure. Unlike "expert heuristic" [illegible]
approach has been demonstrated to work well over large datasets. Unlike
[illegible] neural network algorithms, this approach is physicochemically
[illegible] one-dimensional structural features and our previously
developed method for tertiary structure recognition all share a common
Bayesian probabilistic basis. This consistency starkly contrasts with the
hybrid and ad hoc nature of methods [illegible]
[illegible] years.
CT Check Tags: Support, Non-U.S. Gov't; Support, U.S. Gov't, Non-P.H.S.;
Support, U.S. Gov't, P.H.S.
Algorithms
Amino Acids: CH, chemistry
*Bayes Theorem
Chemistry, Physical
*Evolution, Molecular
*Protein Structure, Secondary
CN 0 (Amino Acids)

L45 ANSWER 17 OF 30 MEDLINE
AN [illegible] MEDLINE
DN [illegible]

TI A 3D-1D substitution matrix for protein fold recognition that includes
predicted secondary structure of the sequence.
AU Rice D W; Eisenberg D
CS UCLA-DOE Laboratory of Structural Biology and Molecular Medicine,
Molecular Biology Institute, UCLA, Los Angeles, CA 90095-1570, USA.
NC [illegible] (NIGMS)
SO JOURNAL OF MOLECULAR BIOLOGY, (1997 Apr 11) 267 (4) 1026-38.
Journal code: J6V. ISSN: 0022-2836.
CY ENGLAND: United Kingdom
DT Journal; Article; (JOURNAL ARTICLE)
LA English
FS Priority Journals; Cancer Journals
EM 199707
EW 1997070[illegible]
AB In protein fold recognition, a probe amino acid sequence is compared to a
library of representative folds of known structure to identify a
structural homolog. In cases where the probe and its homolog have clear
sequence similarity, traditional residue substitution matrices have been
used to predict the structural similarity. In cases where the probe is
sequentially distant from its homolog, we have developed a (7 x 3 x 2 x 7
x 3) 3D-1D substitution matrix (called H3P2), calculated from a database
of [illegible] structural pairs. Members of each pair share a similar fold, but
have sequence identity less than [illegible]. Each probe sequence position is
defined by one of seven residue classes and three secondary structure
classes. Each homologous fold position is defined by one of seven residue
classes, three secondary structure classes, and two burial classes. Thus
the matrix is five-dimensional and contains 7 x 3 x 2 x 7 x 3 = 882
elements or 3D-1D scores. The first step in assigning a probe sequence to
its homologous fold is the prediction of the three-state (helix, strand,
coil) secondary structure of the probe; here we use the profile based
neural network prediction of secondary structure (PHD) program. Then a
dynamic programming algorithm uses the H3P2 matrix to align the
probe sequence with structures in a representative fold library. To test
the effectiveness of the H3P2 matrix a challenging, fold class diverse,
and cross-validated benchmark assessment is used to compare the H3P2
matrix to the GONNET, PAM250, BLOSUM62 and a secondary structure only
substitution matrix. For distantly related sequences the H3P2 matrix
detects more homologous structures at higher reliabilities than do these
other substitution matrices, based on sensitivity versus specificity plots
([illegible] plots). The added efficacy of the H3P2 matrix arises from
its information on the statistical preferences for various
sequence-structure environment combinations from very distantly related
proteins. It introduces the predicted secondary structure information from
a sequence into fold recognition in a statistical way that normalizes the
inherent correlations between residue type, secondary structure and
solvent accessibility.
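The size of the five-dimensional H3P2 matrix follows from the class counts stated in the abstract: seven residue classes, three secondary structure classes and two burial classes for the fold position, and seven residue classes and three secondary structure classes for the probe position. The sketch below checks that arithmetic and shows one way such a table could be indexed. The axis order and the row-major flattening are assumptions for illustration, not details from the record.

```python
from math import prod

# Assumed axis order: fold residue class, fold secondary structure class,
# fold burial class, probe residue class, probe secondary structure class.
SHAPE = (7, 3, 2, 7, 3)

N_ELEMENTS = prod(SHAPE)  # 7 * 3 * 2 * 7 * 3


def flat_index(fold_res, fold_ss, fold_burial, probe_res, probe_ss):
    """Map five zero-based class indices to one row-major table offset."""
    index = 0
    for value, size in zip(
        (fold_res, fold_ss, fold_burial, probe_res, probe_ss), SHAPE
    ):
        if not 0 <= value < size:
            raise IndexError("class index out of range")
        index = index * size + value
    return index


# One score per (fold environment, probe state) combination; zeros as placeholders.
scores = [0.0] * N_ELEMENTS
```

The product is 882, matching the element count given in the abstract, and the last valid offset is 881.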
AB BACKGROUND: For genome sequencing projects to achieve their full impact on
biology and medicine, each protein sequence must be identified with its
three-dimensional structure. Fold assignment methods (also called profile
and threading methods) attempt to assign sequences to known protein folds
by computing the compatibility of sequence to fold. RESULTS: We have
[illegible]
structural similarity but low sequence similarity to sequence probes. Our
extension combines sequence substitution tables with structural properties
to form a combined profile. The structural properties used in this study
include distances between residues, exposed areas, areas buried by polar
atoms, and properties of the original three-dimensional profile method. We
compared the performance of these combined profiles with different
sequence matrices and with the original three-dimensional profile method.
To determine the optimal gap penalties and weights used with these
profiles, we employed a genetic algorithm. The performance of these
combined profiles was tested by cross validation using independent test
[illegible] sets. CONCLUSIONS: These studies show that the combined
profiles perform better than profiles based on either structural or
sequence information alone.
CT Check Tags: Animal; Human; Support, Non-U.S. Gov't; Support, U.S. Gov't,
TI Predicting solvent accessibility: higher accuracy using Bayesian
statistics and optimized residue substitution classes.
AB We introduce a novel Bayesian probabilistic method for predicting the
solvent accessibilities of amino acid residues in globular proteins. Using
single sequence data, this method achieves prediction accuracies higher
than previously published methods. Substantially improved
predictions, comparable to the highest accuracies reported in the
literature to date, are obtained by representing alignments of
the example proteins and their homologs as strings of residue substitution
classes, depending on the side chain types observed at each
alignment position. These results demonstrate the applicability of
this relatively simple Bayesian approach to structure prediction and
illustrate the utility of the classification methodology previously
developed to extract information from aligned sets of
structurally related proteins.
CT Check Tags: Support, Non-U.S. Gov't; Support, U.S. Gov't, P.H.S.
TI Constructing amino acid residue substitution classes maximally indicative
of local protein structure.
AB Using an information theoretic formalism, we optimise classes of amino
acid substitution to be maximally indicative of local protein structure.
Our statistically-derived classes are loosely identifiable with the
heuristic constructions found in previously published work. However, while
these other methods provide a more rigid idealization of physicochemically
constrained residue substitution, our classes provide substantially more
structural information with many fewer parameters. Moreover, these
substitution classes are consistent with the paradigmatic view of the
sequence-to-structure relationship in globular proteins [illegible] that
the three-dimensional architecture is predominantly determined by the
arrangement of hydrophobic and polar side chains with weak constraints on
the actual amino acid identities. More specific constraints are imposed on
the placement of prolines, glycines, and the charged residues. These
substitution classes have been used in highly accurate predictions of
residue solvent accessibility. They could also be used in the
identification of homologous proteins, the construction and refinement of
multiple sequence alignments, and as a means of condensing and
codifying the information in multiple sequence alignments for
secondary structure prediction and tertiary fold recognition.
CT Check Tags: Support, Non-U.S. Gov't; Support, U.S. Gov't, P.H.S.
AB With the advent of genome sequencing projects, the amino acid sequences of
thousands of proteins are [illegible]
sequences must be identified with its function and its 3-dimensional
structure for us to gain a full understanding of the molecular biology of
organisms. To meet this challenge, new methods are being developed for
fold recognition, the computational assignment of newly determined amino
acid sequences to 3-dimensional protein structures. These methods start
with a library of known 3-dimensional target protein structures. The new
probe sequence is then aligned to each target protein structure
in the library and the compatibility of the sequence for that structure is
scored. If a target structure is found to have a significantly high
compatibility score, it is assumed that the probe sequence folds in much
the same way as the target structure. The fundamental assumptions of this
approach are that many different sequences fold in similar ways and there
is a relatively high probability that a new sequence possesses a
previously observed fold. We review various approaches to fold recognition
and break down the process into its main steps: creation of a library of
target folds; representation of the folds; alignment of the
probe sequence to a target fold using a sequence-to-structure
compatibility scoring function; and assessment of significance of
compatibility. We emphasize that even though this new field of fold
recognition has made rapid progress, technical problems remain to be
solved in most of the steps. Standard benchmarks may help identify the
problem steps and find solutions to the problems.
CT Algorithms
Computer Graphics
Databases, Factual
TI The three-dimensional profile method using residue preference as a
continuous function of residue environment.
AU Zhang K Y; Eisenberg D
CS UCLA-DOE Laboratory of Structural Biology and Molecular Medicine
AB In the 3-dimensional profile method, the compatibility of an amino acid
sequence for a given protein structure is scored as the sum of the
preferences of the residues for their environments in the 3D structure. In
the original method (Bowie JU, Luthy R, Eisenberg D, 1991, Science
253:164-170), residue environments were quantized into 18 discrete
environmental classes. Here, amino acid residue preferences are expressed
as a continuous function of environmental variables (residue area buried
and fractional area buried by polar atoms). This continuous representation
of residue preferences, expressed as a Fourier series, avoids the abrupt
change of preference of residues in slightly different environments, as
encountered in the original method with its 18 discrete environmental
classes. When compared with the discrete 18-class representation of
residue environments, this continuous 3D profile is found to be more
sensitive in identifying sequences that fold into the profiled structure
but share with it little sequence identity. The continuous 3D profile is
also less sensitive to errors in environmental variables than is the
discrete 3D profile. The continuous 3D profile can also be used to detect
wrong folds or incorrectly modeled segments in an otherwise correct
structure, as could the discrete 3D profile (Luthy R, Bowie JU, Eisenberg
D, 1992, Nature 356:83-85). Moreover, the progress of structure
improvement during atomic refinement can also be monitored by examining
the profile scores in a moving-window scan. Finally, by defining a
functional form for profile scores, we open the way to profile atomic
refinement in which an atomic structure adjusts to produce residue
environments more compatible with the protein side chains.
CT Check Tags: Animal; Support, U.S. Gov't, Non-P.H.S.; Support, U.S. Gov't,
P.H.S.
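The scoring rule stated in the abstract above, compatibility as the sum of each residue's preference for its environment, together with the moving-window scan it mentions, can be sketched as follows. The sketch assumes an ungapped one-to-one pairing of sequence positions with structure positions and uses a small hypothetical preference table; the function names and numeric values are illustrative only and do not come from the record.

```python
def profile_score(sequence, profile):
    """Sum, over positions, of the preference of each residue for the
    environment at that position. `profile` is a list with one dict per
    structure position mapping residue letters to preference scores."""
    if len(sequence) != len(profile):
        raise ValueError("sequence and profile must have the same length")
    return sum(position[residue] for residue, position in zip(sequence, profile))


def window_scores(sequence, profile, window):
    """Profile score of every contiguous window, for a moving-window scan."""
    return [
        profile_score(sequence[i:i + window], profile[i:i + window])
        for i in range(len(sequence) - window + 1)
    ]


# Hypothetical three-position profile over a two-letter alphabet.
profile = [
    {"A": 2, "L": -1},
    {"A": -1, "L": 3},
    {"A": 1, "L": 0},
]

total = profile_score("ALA", profile)
scan = window_scores("ALA", profile, window=2)
```

With these toy numbers the sequence ALA scores 6 overall and the two-residue windows score 5 and 4. A low-scoring window in such a scan is the kind of signal the abstract describes for locating incorrectly modeled segments.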
TI An evolutionary approach to folding small alpha-helical proteins that uses
sequence information and an empirical guiding fitness function.
AU Bowie J U; Eisenberg D

CS Department of Chemistry and Biochemistry, University of California, Los
Angeles.
AB Three short protein sequences have been guided by computer to folds
resembling their crystal structures. Initially, peptide fragment
conformations ranging in size from 9 to 25 residues were selected from a
database of known protein structures. A fragment was selected if it was
compatible with a segment of the sequence to be folded, as judged by
three-dimensional profile scores. By linking the selected fragment
conformations together, hundreds of trial structures were generated of the
same length and sequence as the protein to be folded. These starting trial
structures were then improved by an evolutionary algorithm. Selection
pressure for improving the structures was provided by an energy function
that was designed to guide the conformational search procedure toward the
correct structure. We find that by evolution of only 400 structures for
fewer than 1400 generations, the overall fold of some small helical
proteins can be computed from the sequence, with deviations from observed
structures of 2.5-4.0 A for C alpha atoms.
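
Editorial sketch: the abstract above describes improving a population of trial structures by an evolutionary algorithm under selection pressure from an energy function. The Python below illustrates only that general loop (selection, recombination, mutation). Everything in it is an assumption made for illustration: the list-of-floats representation, the one-point crossover, the Gaussian mutation, the `evolve` name and the toy energy. The abstract does not specify the authors' operators or fitness function.

```python
import random


def evolve(population, energy, generations, rng, mutation_scale=0.1):
    """Return the lowest-energy individual after a simple elitist evolution.

    population: list of equal-length lists of floats (hypothetical encoding,
                e.g. one value per backbone dihedral angle), length >= 4.
    energy:     callable scoring one individual; lower is better.
    """
    pop = sorted(population, key=energy)
    for _ in range(generations):
        # Selection: the better half survives unchanged (elitism).
        survivors = pop[: len(pop) // 2]
        children = []
        while len(survivors) + len(children) < len(pop):
            # Recombination: one-point crossover between two survivors.
            a, b = rng.sample(survivors, 2)
            cut = rng.randrange(1, len(a))
            child = a[:cut] + b[cut:]
            # Mutation: perturb one randomly chosen position.
            k = rng.randrange(len(child))
            child[k] += rng.gauss(0.0, mutation_scale)
            children.append(child)
        pop = sorted(survivors + children, key=energy)
    return pop[0]


def toy_energy(vector):
    """Stand-in energy: squared distance from the origin."""
    return sum(x * x for x in vector)


rng = random.Random(0)
start = [[rng.uniform(-3.0, 3.0) for _ in range(4)] for _ in range(20)]
best = evolve(start, toy_energy, generations=50, rng=rng)
```

Because the better half of each generation is carried over unchanged, the best energy can never get worse from one generation to the next, which is the only property this sketch guarantees.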
CT Check Tags: Comparative Study; Support, Non-U.S. Gov't; Support, U.S.
Gov't, Non-P.H.S.; Support, U.S. Gov't, P.H.S.

Algorithms 
*Amino Acid Sequence
*Evolution
Models, Genetic
Models, Molecular
Mutation
*Protein Conformation
*Protein Folding
*Protein Structure, Secondary
*Proteins: CH, chemistry
Proteins: GE, genetics
Recombination, Genetic

Statistics 
CN 0 (Proteins) 



L15 ANSWER 26 OF 30 MEDLINE
AN 9j1oj7 00 MEDLirJ^: 
CN 9;;.loS7 00 

TI Three-dimensional profiles from residue-pair preferences: identification
of sequences with beta/alpha-barrel fold.
AU Wilmanns M; Eisenberg D
AB The three-dimensional profile method expresses the three-dimensional
structure of a protein as a table, the profile, which represents the local
environment of each residue. The score of an amino acid sequence,
aligned with the three-dimensional profile, reflects its
compatibility with the profiled structure. In the original implementation,
each local environment was characterized by its polarity, the area buried
of its side chain, and its secondary structure. Here we describe a
modified three-dimensional profile algorithm that characterizes the local
environment in terms of the statistical preferences of the profiled
residue for neighbors of specific residue types, main-chain conformations,
or secondary structure. Combined profiles of the original and the three
new types were tested on beta/alpha-barrel protein structures. The method
identified the following enzymes of unknown three-dimensional structure as
probable beta/alpha-barrels, all of which catalyze reactions in the
biosynthesis of aromatic amino acids: anthranilate
phosphoribosyltransferase (trpD), glutamine amidotransferase (trpG), and
phosphoribosylformimino-5-aminoimidazole carboxamide ribotide isomerase
(hisA).
TI A method to identify protein sequences that fold into a known
three-dimensional structure.
AU Bowie J U; Luthy R; Eisenberg D

CS Molecular Biology Institute, University of California, Los Angeles
AB The inverse protein folding problem, the problem of finding which amino
acid sequences fold into a known three-dimensional (3D) structure, can be
effectively attacked by finding sequences that are most compatible with
the environments of the residues in the 3D structure. The environments are
described by: (i) the area of the residue buried in the protein and
inaccessible to solvent; (ii) the fraction of side-chain area that is
covered by polar atoms (O and N); and (iii) the local secondary structure.
Examples of this 3D profile method are presented for four families of
proteins: the globins, cyclic AMP (adenosine 3',5'-monophosphate)
receptor-like proteins, the periplasmic binding proteins, and the actins.
This method is able to detect the structural similarity of the actins and
70-kilodalton heat shock proteins, even though those protein families
share no detectable sequence similarity.

Check Tags: Animal; Comparative Study; Support, Non-U.S. Gov't; Support,
U.S. Gov't, Non-P.H.S.; Support, U.S. Gov't, P.H.S.
AB Profile analysis measures the similarity between a target sequence and a
group of aligned sequences (the probe). The probe sequences are
used to produce a position-specific scoring table (the profile) that can
be aligned with any sequence (the target) using standard dynamic
programming methods. We are developing a library of profiles, each
describing a different structural motif. This allows any target sequence
to be rapidly scanned for the presence of structural motifs. Levels of
significance for the comparison of target sequences with the profile are
determined in advance, permitting an objective decision to be made as to
whether a protein is likely to possess a structural motif.
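
Editorial sketch: the abstract above says a position-specific scoring table (the profile) is aligned with a target sequence by standard dynamic programming. The Python below shows one such recurrence, a local alignment with a linear gap penalty. The function name, the dictionary-per-position profile layout, the default mismatch score and the gap value are all assumptions made for illustration, not the scoring scheme of the cited work.

```python
def profile_align_score(profile, target, gap=-2.0, default=-1.0):
    """Best local alignment score of `target` against `profile`.

    profile: list with one dict per profile position, mapping a residue
             letter to its score at that position (hypothetical layout).
    target:  string of residue letters.
    gap:     linear penalty for skipping a profile or target position.
    default: score used for residues missing from a position's dict.
    """
    cols = len(target)
    prev = [0.0] * (cols + 1)
    best = 0.0
    for position in profile:
        cur = [0.0] * (cols + 1)
        for j in range(1, cols + 1):
            match = prev[j - 1] + position.get(target[j - 1], default)
            # Local alignment: a cell never drops below zero.
            cur[j] = max(0.0, match, prev[j] + gap, cur[j - 1] + gap)
            best = max(best, cur[j])
        prev = cur
    return best


toy_profile = [{"A": 2.0}, {"C": 3.0}, {"D": 1.0}]
score_exact = profile_align_score(toy_profile, "ACD")
score_embedded = profile_align_score(toy_profile, "GGACDGG")
```

Only two rows of the table are kept in memory at a time, which is enough to recover the score; recovering the alignment itself would need the full table or a traceback structure.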
CT Check Tags: Animal; Human; Support, Non-U.S. Gov't; Support, U.S. Gov't,
Non-P.H.S.; Support, U.S. Gov't, P.H.S.
AB Profile analysis is a method for detecting distantly related proteins by
sequence comparison. The basis for comparison is not only the customary
Dayhoff mutational-distance matrix but also the results of structural
studies and information implicit in the alignments of the
sequences of families of similar proteins. This information is expressed
in a position-specific scoring table (profile), which is created from a
group of sequences previously aligned by structural or sequence
similarity. The similarity of any other sequence (target) to the group of
aligned sequences (probe) can be tested by comparing the target to
the profile using dynamic programming algorithms. The profile method
differs in two major respects from methods of sequence comparison in
common use: (i) Any number of known sequences can be used to construct the
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DT Article 

AB In protein fold recognition, a probe amino acid sequence is compared to a
library of representative folds of known structure to identify a
structural homolog. In cases where the probe and its homolog have clear
sequence similarity, traditional residue substitution matrices have been
used to predict the structural similarity. In cases where the probe is
sequentially distant from its homolog, we have developed a (7 times 3
times 2 times 7 times 3) 3D-1D substitution matrix (called H3P2),
calculated from a database of 119 structural pairs. Members of each pair
share a similar fold, but have sequence identity less than 30%. Each probe
sequence position is defined by one of seven residue classes and three
secondary structure classes. Each homologous fold position is defined by
one of seven residue classes, three secondary structure classes, and two
burial classes. Thus the matrix is five-dimensional and contains 7 times 3
times 2 times 7 times 3 = 882 elements or 3D-1D scores. The first step in
assigning a probe sequence to its homologous fold is the prediction of the
three-state (helix, strand, coil) secondary structure of the probe; here
we use the profile based neural network prediction of secondary structure
(PHD) program. Then a dynamic programming algorithm uses the H3P2 matrix
to align the probe sequence with structures in a representative fold
library. To test the effectiveness of the H3P2 matrix a challenging, fold
class diverse, and cross-validated benchmark assessment is used to compare
the H3P2 matrix to the GONNET, PAM250, BLOSUM62 and a secondary structure
only substitution matrix. For distantly related sequences the H3P2 matrix
detects more homologous structures at higher reliabilities than do these
other substitution matrices, based on sensitivity versus specificity plots
(or SENS-SPEC plots). The added efficacy of the H3P2 matrix arises from
its information on the statistical preferences for various
sequence-structure environment combinations from very distantly related
proteins. It introduces the predicted secondary structure information from
a sequence into fold recognition in a statistical way that normalizes the
inherent correlations between residue type, secondary structure and
solvent accessibility.
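
Editorial sketch: the abstract above describes a five-dimensional substitution matrix whose axes are seven residue classes and three secondary structure classes for the probe, and seven residue classes, three secondary structure classes and two burial classes for the fold. The Python below only checks the size that follows from those dimensions and shows one way to address such a table as a flat array. The axis order and the names `SHAPE` and `flat_index` are assumptions; the abstract does not say how the matrix is stored.

```python
from math import prod

# Assumed axis order (hypothetical): fold residue class, fold secondary
# structure, fold burial, probe residue class, probe secondary structure.
SHAPE = (7, 3, 2, 7, 3)


def flat_index(index, shape=SHAPE):
    """Row-major position of a five-part class index in a flat score list."""
    if len(index) != len(shape):
        raise ValueError("index must have one entry per axis")
    flat = 0
    for i, n in zip(index, shape):
        if not 0 <= i < n:
            raise IndexError(f"class {i} out of range for axis of size {n}")
        flat = flat * n + i
    return flat


n_scores = prod(SHAPE)          # total number of 3D-1D scores in the table
scores = [0.0] * n_scores       # placeholder values, not real H3P2 scores
last = flat_index((6, 2, 1, 6, 2))
```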
AB We introduce a novel Bayesian probabilistic method for predicting the
solvent accessibilities of amino acid residues in globular proteins. Using
single sequence data, this method achieves prediction accuracies higher
than previously published methods. Substantially improved
predictions, comparable to the highest accuracies reported in the
literature to date, are obtained by representing alignments of the example
proteins and their homologs as strings of residue substitution classes,
depending on the side chain types observed at each alignment position.
These results demonstrate the applicability of this relatively simple
Bayesian approach to structure prediction and illustrate the utility of
the classification methodology previously developed to extract information
from aligned sets of structurally related proteins.
SO Protein Science, (1996) Vol. 5, No. 5, pp. 947-955.
ISSN: 0961-8368.
DT Article
LA English

AB In protein fold recognition, one assigns a probe amino acid sequence of
unknown structure to one of a library of target 3D structures. Correct
assignment depends on effective scoring of the probe sequence for its
compatibility with each of the target structures. Here we show that, in
addition to the amino acid sequence of the probe, sequence-derived
properties of the probe sequence (such as the predicted secondary
structure) are useful in fold assignment. The additional measure of
compatibility between probe and target is the level of agreement between
the predicted secondary structure of the probe and the known secondary
structure of the target fold. That is, we propose a sequence-structure
compatibility function that combines previously developed compatibility
functions (such as the 3D-1D scores of Bowie et al. [1991] or
sequence-sequence replacement tables) with the predicted secondary
structure of the probe sequence. The effect on fold assignment of adding
predicted secondary structure is evaluated here by using a benchmark set
of proteins (Fischer et al., 1996a). The 3D structures of the probe
sequences of the benchmark are actually known, but are ignored by our
method. The results show that the inclusion of the predicted secondary
structure improves fold assignment by about 25%. The results also show
that, if the true secondary structure of the probe were known, correct
fold assignment would increase by an additional 8-32%. We conclude that
incorporating sequence-derived predictions significantly improves
assignment of sequences to known 3D folds. Finally, we apply the new
method to assign folds to sequences in the SWISSPROT database; six fold
assignments are given that are not detectable by standard
sequence-sequence comparison methods; for two of these, the fold is known
from X-ray crystallography and the fold assignment is correct.
AU Bowie, James U. (1); Luthy, Roland; Eisenberg, David

CS (1) Molecular Biol. Inst., Univ. Calif. at Los Angeles, 405 Hilgard Ave.,
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SO Go, M.^ [Editor:; Schimmel, P. [Editor]. (1999) op, i> 9^.' - tQ 9 . Tracing 
TI Inverse protein folding by the residue pair preference profile
method: Estimating the correctness of alignments of structurally
compatible sequences.

AU Wilmanns, Matthias (1); Eisenberg, David

CS (1) European Molecular Biol. Lab., Meyerhofstr. 1, Postfach 10.2209,
D-69012 Heidelberg Germany
SO Protein Engineering, (1995) Vol. 8, No. 7, pp. 627-639.

DT Article
LA English

AB The residue pair preference profile (R3P) method is an inverse folding
method that combines continuous environmental profiles and pair preference
profiles. The method uses statistical preferences for residue pairs which
score the likelihood of finding a profiled residue to be paired with a
residue
dihedral angles, secondary structure and number of neighboring residues as
a function of residue type. Each residue pair preference is expressed for
all 20 amino acids of the profiled residue and is weighted by the
compatibility of the environment residue with its own local environment.
The R3P method produces an initial profile-sequence alignment which is
then refined by converting the initial profile into a profile of a target
sequence threaded into the structure of the initial profile. We have
tested this method by evaluating alignments of sequences with known 3-D
structures using structural superposition alignments as reference.
R3P-sequence alignments are gtoreq 50% correct on average for sequences
whose 3-D structure pairs superimpose with an r.m.s. deviation of ltoreq
1.07 ANG. The average improvement in correctness during this iterative
refinement is 14%. The R3P-sequence alignments are compared with
sequence-sequence and 3-D profile-sequence alignments. When all three
methods are combined, on average gtoreq 50% of the alignments are correct
for pairs of 3-D structures that superimpose within ^.12 ANG. A 3-D model
of HisA is predicted with the combined method.

CC Mathematical Biology and Statistical Methods 04500
Biochemical Methods - Proteins, Peptides and Amino Acids *10054
Biochemical Studies - Proteins, Peptides and Amino Acids *10064
Biophysics - Molecular Properties and Macromolecules *10506
Biophysics - Biocybernetics *10515

IT Major Concepts
Biochemistry and Molecular Biophysics; Methods and Techniques; Models
and Simulations (Computational Biology)

IT Miscellaneous Descriptors
ALPHA-HELIX; ANALYTICAL METHOD; BETA STRAND; MATHEMATICAL MODEL

L18 ANSWER 11 OF 37 BIOSIS COPYRIGHT 2001 BIOSIS
AN 1995:470123 BIOSIS
DN PREV109t9 34 03423 

TI Local moves: An efficient algorithm for simulation of protein
folding.

AU Elofsson, Arne; Le Grand, Scott M.; Eisenberg, David (1)

CS (1) UCLA-DOE Lab. Structural Biol. Mol. Med., Mol. Biol. Inst., UCLA, Los
Angeles, CA 90095-1570 USA
SO Proteins Structure Function and Genetics, (1995) Vol. 23, No. 1, pp.
73-82.
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CT Article 
LA English

AB We have enhanced genetic algorithms and Monte Carlo methods for simulation
of protein folding by introducing "local moves" in dihedral space. A local
move consists of changes in backbone dihedral angles in a sequential
window while the positions of all atoms outside the window remain
unchanged. We find three advantages of local moves: (1) For some energy
functions, protein conformations of lower energy are found; (2) these low
energy conformations are found in fewer steps; and (3) the simulations are
less sensitive to the details of the annealing protocol. To distinguish
the effectiveness of the local move algorithm from the complexity of the
energy function, we have used several different energy functions. These
energy functions include the Profile score (Bowie et al., Science
253:164-170, 1991), the knowledge-based energy function used by Bowie and
Eisenberg (Proc. Natl. Acad. Sci. U.S.A. 91:4436-4440, 1994), two
energy terms developed as suggested by Sippl and coworkers (Hendlich et
al., J. Mol. Biol. 216:167-180, 1990), and AMBER (Weiner and Kollman, J.
Comp. Chem. 2:287, 1981). Besides these energy functions we have used
three energy functions that include knowledge of the native structures:
the RMSD from the native structure, the distance matrix error, and an
energy term based on the distance between different residue types called
L)F'IM. In some of these simulations the main advantage of local moves is
the reduced dependence on the details of the annealing schedule. In other
simulations, local moves are superior to other algorithms as structures
with lower energy are found.
TI Verification of protein structures: Patterns of nonbonded atomic
interactions.
AU Colovos, Chris; Yeates, Todd O. (1)

CS (1) Dep. Chem. and Biochem., Univ. Calif., 405 Hilgard Avenue, Los
Angeles, CA 90024-1569 USA
AB A novel method for differentiating between correctly and incorrectly
determined regions of protein structures based on characteristic atomic
interaction is described. Different types of atoms are distributed
nonrandomly with respect to each other in proteins. Errors in model
building lead to more randomized distributions of the different atom
types, which can be distinguished from correct distributions by
statistical methods. Atoms are classified in one of three categories:
carbon (C), nitrogen (N), and oxygen (O). This leads to six different
combinations of pairwise noncovalently bonded interactions (CC, CN, CO,
NN, NO, and OO). A quadratic error function is used to characterize the
set of pairwise interactions from nine-residue sliding windows in a
database of 96 reliable protein structures. Regions of candidate protein
structures that are mistraced or misregistered can then be identified by
analysis of the pattern of nonbonded interactions from each window.
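
Editorial sketch: the abstract above classifies atoms as C, N or O and examines the six resulting pair types among nonbonded interactions. The Python below illustrates only the counting step for one window of atoms. The 3.5 distance cutoff, the tuple layout and the `pair_fractions` name are assumptions, and the sketch assumes the caller has already excluded covalently bonded pairs; the quadratic error function and reference database mentioned in the abstract are not reproduced.

```python
from itertools import combinations
from math import dist

PAIR_TYPES = ("CC", "CN", "CO", "NN", "NO", "OO")


def pair_fractions(atoms, cutoff=3.5):
    """Fraction of each C/N/O pair type among atom pairs within `cutoff`.

    atoms:  list of (element, (x, y, z)) tuples; other elements are ignored.
    cutoff: hypothetical contact distance, in the units of the coordinates.
    """
    counts = dict.fromkeys(PAIR_TYPES, 0)
    kept = [(e, p) for e, p in atoms if e in ("C", "N", "O")]
    for (e1, p1), (e2, p2) in combinations(kept, 2):
        if dist(p1, p2) <= cutoff:
            # Sorting the two letters maps e.g. "OC" onto the key "CO".
            counts["".join(sorted(e1 + e2))] += 1
    total = sum(counts.values())
    return {k: (v / total if total else 0.0) for k, v in counts.items()}


window_atoms = [
    ("C", (0.0, 0.0, 0.0)),
    ("N", (1.0, 0.0, 0.0)),
    ("O", (0.0, 1.0, 0.0)),
    ("C", (10.0, 10.0, 10.0)),  # too far away to contact anything
]
fractions = pair_fractions(window_atoms)
```

Comparing such per-window fractions against values tabulated from trusted structures is the kind of statistic the abstract describes; how the comparison is scored is left out here because the abstract gives no formula.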
TI An evolutionary approach to folding small alpha-helical proteins
that uses sequence information and an empirical guiding fitness function.
AU Bowie, James U. (1); Eisenberg, David

CS (1) Dep. Chem. Biochem., Univ. California Los Angeles-Dep. Energy Lab.,
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DT Article 
AB Three short protein sequences have been guided by computer to folds
resembling their crystal structures. Initially, peptide fragment
conformations ranging in size from 9 to 25 residues were selected from a
database of known protein structures. A fragment was selected if it was
compatible with a segment of the sequence to be folded, as judged by
three-dimensional profile scores. By linking the selected fragment
conformations together, hundreds of trial structures were generated of the
same length and sequence as the protein to be folded. These starting trial
structures were then improved by an evolutionary algorithm. Selection
pressure for improving the structures was provided by an energy function
that was designed to guide the conformational search procedure toward the
correct structure. We find that by evolution of only 400 structures for
fewer than 1400 generations, the overall fold of some small helical
proteins can be computed from the sequence, with deviations from observed
structures of 2.5-4.0 ANG for C-alpha atoms.
TI Computer simulation of antibody binding specificity.
AB A Monte Carlo algorithm that searches for the optimal docking
configuration of hen egg white lysozyme to an antibody is developed. Both
the lysozyme and the antibody are kept rigid. Unlike the work of other
authors, our algorithm does not attempt to explicitly maximize surface
contact, but minimizes the energy computed using coarse-grained pair
potentials. The final refinement of our best solutions using all-atom OPLS
potentials (Jorgensen and Tirado-Rives) consistently yields the native
conformation as the preferred solution for three different antibodies. We
find that the use of an exponential distance-dependent dielectric function
is an improvement over the more commonly used linear form.
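
Editorial sketch: the abstract above describes a Monte Carlo search that minimizes an energy rather than maximizing surface contact. The abstract does not state the acceptance rule, so the Python below shows the standard Metropolis criterion purely as an assumed example of how a Monte Carlo step can favour lower energies while still allowing occasional uphill moves. The function name, the temperature parameter and its units are hypothetical.

```python
import math
import random


def metropolis_accept(delta_energy, temperature, rng=random):
    """Decide whether to accept a trial move under the Metropolis rule.

    delta_energy: energy of the trial configuration minus the current one.
    temperature:  positive scale, in the same units as the energy.
    rng:          object with a random() method returning a float in [0, 1).
    """
    if temperature <= 0.0:
        raise ValueError("temperature must be positive")
    if delta_energy <= 0.0:
        return True  # downhill and neutral moves are always accepted
    # Uphill moves are accepted with probability exp(-dE / T).
    return rng.random() < math.exp(-delta_energy / temperature)


rng = random.Random(1)
downhill = metropolis_accept(-1.0, 1.0, rng)
hopeless = metropolis_accept(1.0e9, 1.0, rng)
```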
position-specific scoring table (profile), which is created from a group
of sequences previously aligned by structural or sequence similarity. The
similarity of any other sequence (target) to the groups of aligned
sequences (probe) can be tested by comparing the target to the profile
using dynamic programming algorithms. The profile method differs in two
major respects from methods of sequence comparison in common use: (i) Any
BPI protein, or fragment, analogue or variant, to model a
BPI-related lipid transfer protein; ( ) the use of atomic
coordinates of BPI protein to computationally design a chemical
compound for mimicking BPI protein, or fragment, analogue or
proliferation, inhibition of angiogenesis, anti-inflammatory,
anticoagulant and antithrombolytic; (7) a method of three-dimensional (3D)
modelling of a BPI protein comprising: (a) providing 3D atomic
coordinates derived from X-ray diffraction measurements of a BPI
protein in a computer readable format; (b) inputting the data from
(a) into a computer with appropriate software programmes; (c) generating a
3D structural representation of the BPI protein suitable for
visualisation and further computational manipulation; (8) a method of 3D
modelling of a BPI-related lipid transfer protein comprising:
(a) providing 3D atomic coordinates derived from X-ray diffraction
measurements of a BPI protein in a computer readable format; (b)
inputting the data from (a) into a computer with appropriate software
programmes; (c) generating a 3D structural representation of the
BPI-related lipid transfer protein suitable for visualisation
and further computational manipulation; (9) a method for providing an
atomic model of a BPI protein or fragment, analogue or variant,
comprising: (a) providing a computer readable medium (CRM) having stored
on it atomic coordinate/x-ray diffraction data of the BPI protein,
or fragment, analogue or variant, in crystalline form, the data
sufficient to model the 3D structure of the BPI protein or
fragment, analogue or variant; (b) analysing on computer, using at least
one subroutine executed in the computer, atomic coordinate/x-ray
diffraction data from (a) to provide atomic coordinate data output
defining an atomic model of the BPI protein, or fragment,
analogue or variant, the analysing utilising at least one computing
algorithm selected from data processing and reduction, auto-indexing,
intensity scaling, intensity merging, amplitude conversion, truncation,
molecular replacement, molecular alignment, molecular refinement, electron
density map calculation, electron density modification, electron map
visualisation, model building, rigid body refinement, positional
refinement; and (c) obtaining atomic coordinate data defining the 3D
structure of at least one of the BPI protein, or fragment,
analogue or variant; (10) a computer-based system for providing atomic
model data of the 3D structure of BPI protein, or fragment,
analogue or variant, a BPI mutant or a BPI fragment, comprising the
following elements: (a) at least one CRM having stored on it atomic
coordinate/x-ray diffraction data of the BPI protein, or
fragment, analogue or variant; (b) at least one computing subroutine that,
when executed in a computer, causes the computer to analyse atomic
coordinate/x-ray diffraction data from (a) to provide atomic coordinate
data output defining an atomic model of the BPI protein, or
fragment, analogue or variant, the analysing utilizing at least one
computing subroutine selected from: data processing and reduction,
auto-indexing, intensity scaling, intensity merging, amplitude conversion,
truncation, molecular replacement, molecular alignment, molecular
refinement, electron density map calculation, electron density
modification, electron map visualisation, model building, rigid body
refinement, positional refinement; and (c) a retrieval device for
obtaining atomic coordinate output data substantially defining the 3D
structure of the BPI protein, or fragment, analogue or variant;
(11) a method for providing a computer atomic model of a ligand of a BPI
protein or fragment, analogue or variant, comprising: (a)
providing a CRM having stored on it atomic coordinate data of a BPI
protein, or fragment, analogue, or variant; (b) providing a CRM
having stored on it atomic coordinate data to generate atomic models of
potential ligands of the BPI protein, or fragment, analogue, or
variant; (c) analysing on a computer, using at least one subroutine
executed in the computer, the atomic coordinate data from (a), and ligand
data from (b), to determine binding sites of BPI protein, or
fragment, analogue or variant, and to provide atomic coordinate data
defining an atomic model of at least one ligand of the BPI, BPI mutant or
a fragment, the analysing utilising computing subroutines selected from
the group consisting of data processing and reduction, auto-indexing,
intensity scaling, intensity merging, amplitude conversion, truncation,
molecular replacement, molecular alignment, molecular refinement, electron
density map calculation, electron density modification, electron map
visualisation, model building, rigid body refinement, positional
refinement; and (d) obtaining atomic coordinate model output data defining
the 3D structure of the at least one ligand of the BPI protein,
or fragment, analogue, or variant, and (12) a computer-based system for
providing an atomic model of at least one ligand of a BPI, BPI mutant or a
fragment, comprising the following elements: (a) a CRM having stored on it
atomic coordinate data of a BPI, mutant or fragment; (b) a CRM having
stored on it atomic coordinate data to generate atomic models of potential
ligands of a BPI, mutant or fragment; (c) at least one computing
subroutine for analysing on a computer, the atomic coordinate data from
(a) and (b), to determine binding sites of BPI protein, or
fragment, analogue, or variant, and to provide data output defining an
atomic model of at least one potential ligand of BPI protein, or
fragment, analogue or variant, the analysing utilising at least one
computing subroutine selected from data processing and reduction,
auto-indexing, intensity scaling, intensity merging, amplitude conversion,
truncation, molecular replacement, molecular alignment, molecular
refinement, electron density map calculation, electron density
modification, electron map visualisation, model building, rigid body
refinement, positional refinement; and (d) a retrieval device for
obtaining atomic coordinate data of the at least one ligand of a BPI
protein or fragment, analogue or variant.

USE - The methods can be used for molecule modelling of BPI and
related proteins and rational drug design of mimetics and
ligands for BPI and for related proteins. They can also be used
to provide mutants of BPI or fragments, analogues or variants
characterised by one or more different properties as compared with
wild-type BPI. These properties include altered surface charge, altered
lipid binding pockets, altered specificity or higher activity. They can be
used to design compounds with at least one activity selected from
antibacterial, antifungal, antimycobacterial, antichlamydial,
antiprotozoan, heparin-binding, endotoxin-binding, heparin-neutralising,
endotoxin-neutralising, inhibition of tumour and endothelial cell
proliferation, inhibition of angiogenesis, anti-inflammatory,
anticoagulant and antithrombolytic.
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Dim Ni9':o-03'i '^x 9 hnc ■:rj;^3-03o»,6: 

TI Characterising the three-dimensional structure of a protein - by
analysing amino acid residue positions and comparing with known
protein structures.

DC B04 D16 T01

IN BOWIE, J U; EISENBERG, D; LUTHY, R
PA (REGC) UNIV CALIFORNIA
CYC 18 

PI W(_' :'30I4;-;'] Al i:^;O0131 a003:.3;'^ EN z--p C0oF0:5-30 ■: — 

FW: A'3 P;E CH b£ :K EE EE -E IT EU MC !:E EE 
W; Afi CA JE 

Ail '.EE^OoE A E-^;O0_ll (1.^:^;—:; COOTOIS-EO 
EE 3-; -0830 A E;'E307E5 (1E03.-3, 2 :p G0-]F'r_9-0r 

ALiT WO ■^3::)143-1 Al WCi 1 E eE -U;J 5 7 3 l'-'OC710; AO ,EEM03E A AE 1EE2-E408E 

1e93(.);I0; dS 54 3G8EE A Cont c^f IE; lEEl-728o4ij l'-0)li)7 11, US 1 94 -E 1 8 68 5 
1994 0 838 

FPT AE 9334083 A Based - n WO 9301484 

PFE\I US 1991-728640 19910711; US 1 994-318685 19940338 

REP US 4704 692; US 4717^:33; US 48EE873; US 4881175; US 4908773; US 4939666; US 

4 94 67 7 H ; 1 j s 4 9 7 6 9 5 8 ; (J S 5 Ci F: 7 3 5 8 
IC I>1 G06F015-20; G06F019-00 

I -^S r 1211015-00; C 1 E C^O 0 1 - 6 8 

Characterising the 3-dimensional (3-D) structure of a protein
comprises (a) determining, from the 3-D structure of the protein,
values for n structural properties P1, P2 .... Pn for each amino acid
residue position of the protein, (b) assigning each residue of
the protein to one environment class based upon the values for
the n structural properties P1, P2 .... Pn for the residue, thereby
generating a 1-dimensional environment string comprising the environment
class of each residue in the 3-D protein structure.
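
Editorial sketch: the claim above maps n structural properties of each residue to one environment class and concatenates the classes into a one-dimensional environment string. The Python below illustrates that mapping for three properties (buried side-chain area, polar fraction, secondary structure). The thresholds 40.0, 115.0 and 0.5, the class labels and both function names are placeholders invented for this sketch; they are not the published cutoffs or class names.

```python
def environment_class(area_buried, polar_fraction, secondary_structure):
    """Assign one residue to a labelled environment class.

    area_buried:         buried side-chain area (placeholder thresholds).
    polar_fraction:      fraction of side-chain area covered by polar atoms.
    secondary_structure: "H" (helix), "E" (strand) or "C" (other).
    """
    if secondary_structure not in ("H", "E", "C"):
        raise ValueError("secondary_structure must be 'H', 'E' or 'C'")
    if area_buried < 40.0:        # hypothetical cutoff
        burial = "exposed"
    elif area_buried < 115.0:     # hypothetical cutoff
        burial = "partial"
    else:
        burial = "buried"
    polarity = "polar" if polar_fraction >= 0.5 else "apolar"
    return f"{burial}/{polarity}/{secondary_structure}"


def environment_string(residues):
    """Map per-residue property tuples to a list of environment classes."""
    return [environment_class(*properties) for properties in residues]


example = environment_string([(10.0, 0.8, "H"), (150.0, 0.1, "E")])
```

With three burial levels, two polarity levels and three secondary structure states, this placeholder scheme yields 18 labels, but the way the classes are actually partitioned in the patent and related papers may differ.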

USE/ADVANTAGE - Permit the assignment of many amino acid sequences to
known 3-D structures. Used partic. for screening structural analogues of a
known protein sequence. The compatibility searches are able
to detect structural relationships that may not be apparent by sequence
similarity.
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FS CPI EPI 
FA AB; GI 

MC CPI: B04-B04A; B12-K04; D05-H09
EPI: T01-J10B2

ABEQ US 5436850 A UPAB: 19950905

The three-dimensional structure of a protein is characterised by
determining values for n structural properties P1-Pn for each amino acid
residue, and assigning each residue to one of a number of environmental
classes based on the values to generate a one-dimensional environment
string comprising the class of each residue. The data generated are input
into a programmed computer which compares them to a database of other
proteins of known structure and outputs analogous structures. The
properties pref. include the total area of a residue side-chain buried by
other protein atoms inaccessible to solvent, the fraction of the
side-chain area covered by polar atoms or water, and the local secondary
structure.

USE/ADVANTAGE - Partic. for identifying protein sequences
which fold into a known three-dimensional structure. Relates a
one-dimensional target sequence directly to known three-dimensional
structures and effectively utilises information about the accommodation of
sequence changes inherent in known structures.
Dwg.1/8



